Mel-frequency cepstral coefficient-based bandwidth extension of narrowband speech

Authors

  • Amr H. Nour-Eldin
  • Peter Kabal
Abstract

We present a novel MFCC-based scheme for the bandwidth extension (BWE) of narrowband speech. BWE is based on the assumption that narrowband speech (0.3–3.4 kHz) correlates closely with the highband signal (3.4–7 kHz), enabling estimation of the highband frequency content given the narrowband. While BWE schemes have traditionally used LP-based parametrizations, our recent work has shown that MFCC parametrization results in higher correlation between the two bands, reaching twice that obtained using LSFs. By employing a high-resolution IDCT of highband MFCCs, themselves obtained from narrowband MFCCs by statistical estimation, we achieve high-quality highband power spectra from which the time-domain speech signal can be reconstructed. Implementing this scheme for BWE translates the higher correlation advantage of MFCCs into BWE performance superior to that obtained using LSFs, as shown by improvements in log-spectral distortion as well as Itakura-based measures (the latter improving by up to 13%).
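The high-resolution IDCT step and the log-spectral distortion measure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unnormalized DCT-II convention, and the choice to evaluate the inverse transform on a finer grid along the mel filter axis are all assumptions made for illustration, and the statistical estimation of highband MFCCs from narrowband MFCCs is not shown.

```python
import math


def dct2(log_energies, num_coeffs):
    """Unnormalized DCT-II of log mel filter-bank energies (assumed convention)."""
    m = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / m)
            for i, e in enumerate(log_energies))
        for k in range(num_coeffs)
    ]


def high_res_idct(coeffs, num_filters, num_points):
    """Evaluate the inverse of dct2 at num_points positions along the filter axis.

    With num_points == num_filters and all coefficients kept, this recovers
    the original log energies; a larger num_points samples the same cosine
    series on a finer grid.
    """
    out = []
    for j in range(num_points):
        pos = (j + 0.5) / num_points  # normalized position along the filter axis
        val = coeffs[0] / num_filters
        val += (2.0 / num_filters) * sum(
            c * math.cos(math.pi * k * pos)
            for k, c in enumerate(coeffs) if k > 0
        )
        out.append(val)
    return out


def log_spectral_distortion(power_a, power_b):
    """Root-mean-square difference, in dB, between two power spectra."""
    diffs = [
        10.0 * math.log10(a) - 10.0 * math.log10(b)
        for a, b in zip(power_a, power_b)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    # Hypothetical log energies from a 4-filter highband mel filter bank.
    log_e = [1.0, 2.0, 0.5, 3.0]
    mfcc = dct2(log_e, num_coeffs=4)
    fine = high_res_idct(mfcc, num_filters=4, num_points=16)
    print(len(fine), "points on the fine mel-scale grid")
```

Evaluating the cosine series on a finer grid is equivalent to zero-padding the coefficient vector before a longer inverse DCT, up to a scale factor. The fine grid here is still in log mel-energy units; converting it to a linear-frequency power spectrum would require a further mapping that this sketch leaves out.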

Similar resources

Robust Bandwidth Extension of Noise-co

We present a new bandwidth extension algorithm for converting narrowband telephone speech into wideband speech using a transformation in the mel cepstral domain. Unlike previous approaches, the proposed method is designed specifically for bandwidth extension of narrowband speech that has been corrupted by environmental noise. We show that by exploiting previous research in mel cepstrum feature ...

On the Relevance of Band for Speaker Verif

In this paper, we consider the effect of a bandwidth extension of narrow-band speech signals (0.3-3.4 kHz) to 0.3-8 kHz on speaker verification. Using covariance matrix based verification systems together with detection error trade-off curves, we compare the performance between systems operating on narrowband, wide-band (0-8 kHz), and bandwidth-extended speech. The experiments were conducted us...

Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing

Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are based on a filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The value of the mel cepstral vector, as well as the properties of the corresponding cepstral distance, are d...

Improving the filter bank of a classic speech feature extraction algorithm

The most popular speech feature extractor used in automatic speech recognition (ASR) systems today is the mel frequency cepstral coefficient (mfcc) algorithm. Introduced in 1980, the filter bank-based algorithm eventually replaced linear prediction cepstral coefficients (lpcc) as the premier front end, primarily because of mfcc’s superior robustness to additive noise. However, mfcc does not app...

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...

Publication date: 2008